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We describe QWalk, a new computational package capable of performing Quantum Monte Carlo 
electronic structure calculations for molecules and solids with many electrons. We describe the 
structure of the program and its implementation of Quantum Monte Carlo methods. It is open- 



source, licensed under the GPL, and available at the web site http://www.qwalk.org 



I. INTRODUCTION 

Solution of the stationary Schrodinger equation for in- 
teracting systems of quantum particles is one of the key 
challenges in quantum chemistry and condensed matter 
physics. In particular, many problems in electronic struc- 
ture of atoms, molecules, clusters and solids require the 
ground and excited eigenstates of the electron-ion Born- 
Oppcnhcimcr Hamiltonian 




where upper/lower cases indicate nuclei /electrons. Due 
to the Coloumb interaction, the eigenstates are very 
complicated functions in the in 37V e -dimensional space 
where N e is the number of electrons. Over the past six 
decades or so, physicists and chemists have developed 
many powerful approaches and theories that attempt to 
solve the electronic structure problem with varying de- 
grees of accuracy. Among these are the wave function 
methods such as Hartree-Fock (HF) and post Hartree- 
Fock (post-HF), and also methodologies which are based 
on functionals of electron density such as Density Func- 
tional theories (DFT). Because none of them are exact 
in practice, each of these methods occupies its place in 
the computational toolbox. DFT represents an excellent 
tradeoff between accuracy and computational efficiency, 
allowing thousands of electrons to be treated, usually get- 
ting qualitative trends correctly for many quantities and 
materials such as cohesive/binding energies, many (but 
not all) energy differences between different systems, and 
can even be quantitatively accurate for some quantities 
(such as geometries), especially for the systems of atoms 
from the first two rows of the periodic table. Many sys- 
tems and effects are, however, not accurately described 
(van der Waals systems, systems with transition metal 
atoms, many excitations, etc.) and require treatment of 
quantum many-body effects more accurately. One can 
turn to post-Hartree-Fock methods based on sophisti- 
cated expansions of wave functions in one-particle ba- 
sis sets. These methods can be made formally exact, 
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unfortunately, the computational cost is substantial and 
the most accurate approaches scale quite poorly with the 
system size, say, 0(N^~ 7 ). It is very difficult to find a 
method that scales well, at most O(N^), and also offer 
higher accuracy than DFT. 

Quantum Monte Carlo methods fill this gap by us- 
ing stochastic algorithms to treat the many-body wave 
function in the full 3iV e -dimensional space. It has sev- 
eral advantages-good scaling in the number of electrons 
(0(7Vg~ 3 ), depending on the quantity of interest) and is 
amenable to parallel implementations at 99% efficiency. 
Over the past ^20 years, QMC has been applied to a 
host of systems such as model systems, atoms, molecules 
and solids, with impressive accuracy across this wide 
range jUE]- For extended systems, particularly, it is the 
most accurate method available for total energies on the 
materials that have been tested. Since these calculations 
represent rather recent developments, the packages for 
QMC are currently in development and only few options 
are available for the community at large. We have devel- 
oped a new program QWalk for general purpose QMC 
calculations written in C++ with modern programming 
techniques and incorporating state of the art algorithms 
in a fast and flexible code. QWalk has already been used 
in several publications [3 |U [3 El [7J E] , and we would like 
to present a summary of its current capabilities. 



II. METHOD 

A. Variational Monte Carlo 

The expectation value for an arbitrary operator O and 
a given trial variational wave function is given by 

(n) = (grlgjgr) /*|.(R)[OttT(R)/ttr(R)]<fll 

where R = (ri, T2, rjy e ) denotes a set of N e electron 
coordinates in 3D space. Typically, such integrals are 
evaluated by reducing the multi-dimensional integral into 
a sum of products of low-dimensional integrals. Unfortu- 
nately, this either restricts the functional form of ^t(R) 
or makes the calculations undo-able for more than a few 
electrons. One of the key motivations for employing 
stochastic approaches is to eliminate this restriction and 
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to gain qualitatively new variational freedom for describ- 
ing many-body effects. 

In order to evaluate the expectation value integral 
stochastically we first generate a set {R TO } of statisti- 
cally independent sampling points distributed according 
to ^|,(R) using the Metropolis algorithm. The expecta- 
tion value is then estimated by averaging over the sam- 
ples {R m }. For example, the VMC energy is given by 
the average of the quantity called local energy 
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with the statistical error e proportional to 1/ \[M. 

It is straightforward to apply the variational theorem 
in this framework. Consider a variational wave function 
*S?t{R,P), where R is the set of all the electron posi- 
tions and P is the set of variational parameters in the 
wavef unction 



E(P) 



J * T (R, P)HV t (R, P)dR 



(2) 



A (hopefully) good approximation to the ground state is 
then the wavefunction with the set of parameters P that 
minimizes E{P). The stochastic method of integration 
allows us to use explicitly correlated trial wave functions 
such as the Slater-Jastrow form, along with other func- 
tional forms as explained later. In fact, as long as the trial 
function and its derivatives can be evaluated quickly, any 
functional form can be used. 

Within the program, this procedure is broken down 
into two parts: sampling while evaluating energy 
and other properties, and optimizing the wave function. 
The first part, sampling "J 5,, is carried out using the 
Metropolis-Hastings [5J |TU] algorithm. We start with 
a point R in 37V e dimensional space and generate a 
second point R' according to the transition probability 
T(R' <— R). T is a completely arbitrary function as long 
as T(R <- R) ^ T(R <- R') ^ 0; that is, all moves 
are reversible. We then accept the move with probability 



vnin 1 



^(R')T(R' <- R) 
tf^(R)T(R <- R') 



(3) 



After a few steps, the distribution converges to and 
we continue making the moves until the statistical uncer- 
tainties are small enough. For atoms with effective core 
potentials, we use the moves as outlined in Ref. pQ, mod- 
ified with a delayed rejection step similar to Ref. [IT], 
although developed independently, and for full-core cal- 
culations, we use the accelerated Metropolis method from 
Ref. [12]. The total energy and its components are eval- 
uated, as well as other properties. 

We then optimize the wave function using a fixed set 
of sample points. Since the samples are then correlated, 



small energy differences can be determined with much 
greater precision than the total energy. There are many 
quantities other than energy that, upon being minimized, 
will provide a good approximation to the ground state 
wave function. One important one is the variance of the 
local energy; that is 



J dR^ 2 T (R)(E loc - (E loc )f 
J dR*|(R) 



(4) 



Since Ei oc is a constant when \^t) = \®o), the variance 
will go to zero for an exact eigenstate. There are several 
other possible functions, listed in Sec. |IV B| but variance 
and energy are the most common quantities to minimize. 



B. Projector Monte Carlo 

To obtain accuracy beyond a given variational ansatz, 
we employ another method which projects out the ground 
state of a given symmetry from any trial wave func- 
tion. To do this, we simulate the action of the operator 
e~(- H_ - E °) r on the trial function, where r is the projection 
time and Eq is the self-consistently determined energy of 
the ground state. As r — > 00, e~( H ~ E °' Ti $iT — * ^o, where 
$0 is the ground state. For large t, there is no general 
expansion for e~( H ~~ E °^ T , but for small r, we can write 
the projection operator in R-representation as 

G(R',R,r) ~ exp(-(R' - R) 2 /2r) 

x exp(~(V(R) + V(R!)-2E )) 

which can be interpreted as a dynamic diffusion kernel 
Gd(R',R,t) = exp{— (R' — R) 2 /2t) times a branching 
kernel G b (R',R,t) = exp(-|(V(R) + V( R ') ~ 2E o))- 

The basic idea of projector Monte Carlo is to sample 
a path G , (RAr,R7v-i,T)...G(R 2 ,Ri,T)* T (Ri). For N 
large enough (for a long enough path), the distribution 
of Rjy will approach $0. However, to interpret this as 
a stochastic process, the path distribution must be pos- 
itive; that is, the product of all G's with vf^ must be 
positive. This gives rise to the fixed node approxima- 
tion [Tj2 |TU [T5] [TO] , where the nodes (the places where 
the trial function equals zero) of the trial wave function 
are used as approximation to the nodes of the ground 
state wave function. One can avoid this restriction by 
performing a released-node calculation [T7] , although the 
price is a change from polynomial to exponential scaling 
with system size. With the nodal constraint, the pro- 
jector Monte Carlo approach typically obtains 90-95% of 
the correlation energy in an amount of time proportional 
to N" where a = 2, 3 depending on actual implementa- 
tion and type of the system. In what follows we there- 
fore assume that the fixed-node condition is enforced and 
therefore $0 is the antisymmetric ground state for a given 
fixed-node boundary condition. 

In actual calculations, we perform an importance- 
sampling transformation, where G(R',R, r) is replaced 
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by the importance sampled Green's function 

G(R, R, t) = * T (R')G(R', R, r) /* T (R) (5) 
The dynamic part of the Green's function then becomes 
Gb(R',R,t) = ea;p(-(R'-R-TV/n^ T (R)) 2 /2T) (6) 
and the branching part becomes 

Gb(R',R,t) = exp(~(E L (R) +E L (TL') - 2E )), (7) 

both of which arc much better-behaved stochasti- 
cally, since the 'force' VZnWj^R) biases the walk 
to where the wavefunction is large, and the lo- 
cal energy _Z?l(R) is much smoother than the 
potential energy. Then if we generate the path 
G(Rjv,R J v-i,t)...G'(R 2 ,Ri,t)*2,(R 1 ), for large 
enough N, the distribution of R^r is $r(Rjv)$o(Rjv)i 
which is called the mixed distribution. The ground 
state energy is obtainable by evaluating the integral 
J ^ T ^ H^ T /^ T dIl = J <& Hq> T dIl = E , since <& 
is an eigenstate of H within the nodal boundaries. 
In QWalk, two versions of the projector method are 
implemented: Diffusion Monte Carlo, which has the 
advantage that the large N limit is easily obtained, 
and Reptation Monte Carlo, which makes the 'pure' 
distribution <5>q available. 

Diffusion Monte Carlo has been discussed by many au- 
thors [H[Hj, and suffice it to say that it attains the mixed 
distribution by starting with a distribution of H>j, and in- 
terpreting the action of the Green's function as a stochas- 
tic process with killing and branching, eventually ending 
up with ^t^o- K has the advantage that the r — * oo 
limit is easy to achieve, but the disadvantage of not hav- 
ing access to the pure distribution. A more subtle limita- 
tion is that the branching process spoils any imaginary- 
time data and can decrease the efficiency of the simu- 
lation if there is too much branching. Even with these 
limitations, in current implementations DMC is probably 
the most efficient way to obtain the fixed-node approxi- 
mation to the ground state energy. 

For quantities that do not commute with the Hamilto- 
nian, we use Reptation Monte Carlo [18] with the bounce 
algorithm [Tj5]. We sample the path distribution 

n(s) = * T (R )G(R , R u t)... G(R„_ 1; R„, r)^ r (R n ) 

(8) 

where s = [Ro,Ri, . . . ,R„-i,R n ] is a projection path. 
In the limit as t — > oo, exp(— Ht)\^t) —> |^o)> 
the ground state, and, since it is a Hermitian opera- 
tor, the conjugate equation also holds. Therefore, the 
distribution of Ro and R„ is the mixed distribution 
^T(R) < i ) o(R) I and the distribution of R„/ 2 is <f>§(R„/ 2 ) 
in the limit as n — > oo. We evaluate the energy as 
Ermc = ([E l (Rq) + E L (R N )]/2) and operators non- 
commuting with H as Ormc = (G(Rjv/2)) Reptation 
Monte Carlo does not include branching, instead it uses 
an acceptance/rejection step. This is a tradeoff, allow- 
ing us to project only for a finite r, since otherwise the 



TABLE I: The central objects of the code and their physical 
correspondents 



Module name Mathematical object 

System parameters and form of the Hamiltonian 

Sample point R, the integration variables 

Wave function type Wave function ansatz 
Wave function *t(R), V*t(R), V 2 *t(R) 

Dynamics generator Metropolis trial move 
(Green's function) 



probability distribution function is not normalizable, but 
allowing access to the pure distribution and imaginary 
time correlations. The path can sometimes get stuck due 
to rejections even with the bounce algorithm, which is a 
well-known limit on the efficiency of the algorithm. In 
QWalk, RMC is approximately as efficient as DMC un- 
til the rejection rate begins to increase, making the path 
move very slowly. In our current implementation we em- 
pirically find that this slowdown occurs at approximately 
150 electrons, although it also depends on the quality of 
the trial wave function. 



III. ORGANIZATION AND IMPLEMENTATION 

The code is written in a combination of object-oriented 
and procedural techniques. The object-oriented ap- 
proach is coarse-grained, creating independent sections 
of code that are written efficiently in a procedural fash- 
ion. It is extremely modular; almost every piece can be 
removed and replaced with another. A contributor of a 
module only has to change one line in the main code to al- 
low use of a new module. This allows for flexibility while 
keeping the code base relatively simple and separable. 
The modular structure also allows for partial rewrites of 
the code without worrying about other parts. In fact, 
each major module has been rewritten several times in 
this manner as we add new features and refactor the code. 
For the user, this structure shows itself in flexibility. 

The modules form a tree of successive abstractions 
(Fig. [TJ). At the top of the tree is the QMC method, 
VMC in this case. It works only in terms of the objects 
directly below it, which are the concepts of System, Wave 
function data, etc. (see Table [I]). These in turn may have 
further abstractions below them, as we've shown for the 
wave function object. The highest wave function object 
is of type 'Multiply', which uses two wave function types 
to create a combined wave function. In this case, it mul- 
tiplies a Slater determinant with a Jastrow correlation 
factor to form a Slater- Jastrow function. Since the wave 
functions are pluggable, the Slater determinant can be 
replaced with any antisymmetric function, as well as the 
Jastrow factor. The type is listed along with the spe- 
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Method 
(VMC) 




System 
(Molecule) 



Most Abstraction 



Wavefunction_data 
(Multiply) 



Dynamics generator 
(Delayed Rejection) 



Wavef unctiondata 
(Slater Determinant) 



Wavefunction_data 
(Jastrow factor) 



Molecular orbital 
evaluatorf Linear) 




One-body Jastrow 



Two-body Jastrow 



Basis_function 
(Gaussian) 



Basis_function 
(cutoff Pade) 



Basis_function 
(cutoff Pade) 



T 

Least abstraction 



FIG. 1: Calculation structure for the VMC method on a molecule using a Slater- Jastrow wave function. 



cific instant of that type in parenthesis. At each level, 
the part in parenthesis could be replaced with another 
module of the same type. 

We present an implementation of the VMC algorithm 
as an example of how the code is organized (Fig. |2j. For 
reasons of space, we do not write the function line- by- 
line, which includes monitoring variables, etc., but in- 
stead give a sketch of the algorithm. The VMC method 
works at the highest level of abstraction, only in terms 
of the wave function, system, and random dynamics. It 
does not care what kind of system, wave function, etc. 
are plugged in, only that they conform to the correct in- 
terfaces. In Appendix [AJ we give an example of how to 
create a new module. 

We will now provide a listing of the available modules 
for the major types, along with some details of their im- 
plementation. 



IV. METHODS 

A. Variational Monte Carlo 

The VMC module implements the Metropolis method 
to sample the probability density Wj,(R). It has been de- 
scribed in Sec. Ill Al to some dctail-thc method is more or 
less a direct translation. Beyond the basic algorithm, it 
implements correlated sampling as explained in Sec. |IVE| 
for small energy differences between very similar systems. 



Vmc_method: :run(vector <string> & vmc_section, 

vector <string> & system_section, 

vector <string> k wavef unction_section) { 

//Allocate the objects we will be working with 

System * sys=NULL; 

allocate (sys , system_section) ; 

Wavef unction_data * wfdata=NULL; 

allocate (wf data, sys, wavef unction_section) ; 

Sample_point * sample=NULL; 
sys->generateSample (sample) ; 
Wavefunction * wf=NULL; 
wf data->generateWavef unction (wf ) ; 

//the Sample_point will tell the Wavefunction 
//when we move an electron 
sample->attachWavef unction (wf ) ; 
sample->randomGuess ; 

//This is the entire VMC algorithm 
for(int s=0; s< nsteps; s++) { 

for(int e=0; e < nelectrons ; e++) { 

dynamics_generator->sample (e , timestep,wf , sample) ; 

} //end electron loop 

//gather averages 
} //end step loop 

//report final averages 



FIG. 2: Simple VMC code 
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TABLE II: Optimization objective functions implemented 



Function Minimized quantity 

Variance {(El(R) - E rcl ) 2 ) 

Energy (£l(R)) 
Mixed aEnergy + (1 — a)Variance,0 < a < 1 
Absolute value (\E L (R) - E rEf \) 

Lorentz {ln(l + (E L (R) - E ref ) 2 /2)) 



B. Optimization of Wave Functions 

We have implemented three different methods for op- 
timization. All methods are capable of optimizing the 
first three objective functions from Table [II) In principle, 
any objective function from this table will obtain the cor- 
rect ground state with an infinitely flexible function, but 
may obtain different minima for incomplete wave func- 
tions and some are easier to optimize than others. The 
first (OPTIMIZE) is based on Umrigar et al.'s [20] vari- 
ance optimization. The method minimizes the objective 
function on a set of fixed configurations from VMC using 
a conjugate gradient technique, usually not reweighting 
the averages as the wave function changes. Optimizing 
the energy using OPTIMIZE is quite expensive, because 
it requires many configurations to evaluate an unbiased 
estimate of the energy derivative. 

The next two are based on Umrigar and Filippi's New- 
ton optimization [ST] method. OPTIMIZE2 also uses a 
fixed set of configurations, but instead of evaluating only 
the first derivatives of the objective function, as conju- 
gate gradients do, it uses a low-variance estimator for the 
Hessian matrix and Newton's method to find the zeros of 
the first derivatives. OPTIMIZE2 is able to produce bet- 
ter wave functions with lower energies than OPTIMIZE 
by directly optimizing the energy even for very large sys- 
tems (we have applied it for up to 320 electrons) while 
costing slightly more. 

Finally, NEWTON_OPT uses a fixed set of configu- 
rations to calculate the same low-variance estimator for 
the Hessian matrix only at the single step, then evaluates 
the optimal length of the optimization step using VMC 
correlated sampling 21 J. The later step enables us to de- 
crease the number of iterations needed to converge. Fur- 
ther, this method is able to find the very lowest energy 
wave function, since the configurations are regenerated 
at every optimization step. However, the expense of one 
iteration in NEWTON_OPT is larger than for other two 
methods due to the additional cost associated with VMC 
and VMC correlated sampling. 

C. Diffusion Monte Carlo 

DMC is implemented almost identically to VMC, 
except that the time step is typically much smaller 



and each walker accumulates a weight equal to 
exp(~^ft(E L (R') + E L (R) - 2E ref )). Since we use 
an acceptance/rejection step, r e ff is chosen somewhat 
smaller than t as r e f f — pr, where p is the acceptance ra- 
tio. To control the fluctuations in the weights, we employ 
a constant-walker branching algorithm, which improves 
the parallel load balancing properties of DMC. Every few 
steps we choose a set of walkers that have large weights 
(u>i) for branching. Each one of these walkers is matched 
with a smaller weight walker (1^2) which is due for killing. 
The large weight walker is branched and the small weight 
walker is killed with probability w ^ w , with each copy 
gaining a weight of Wl + W2 , Otherwise, the small weight 
walker is branched and the large weight walker is killed, 
with the copies having the same weight as before. Walk- 
ers are then exchanged between nodes to keep the number 
of walkers on each node constant, and thus preserve high 
parallel efficiency QWalk keeps track of two numbers: 
E re f and Eq. E re f is first set to the VMC average en- 
ergy, and then to the energy of the last block. The energy 
that goes into the weights, Eq, is then calculated every 
few steps as 

E = E ref - log , (9) 



where N con f is the number of sample points (configura- 
tions) in the simulation. 

During the DMC calculation, the local energy will 
very occasionally fluctuate down significantly, causing 
the weight to increase too much. Of course, this is very 
much dependent on the quality of the trial function and 
the studied system. This can be fixed by cutting off the 
weights. For fluctuations beyond ten standard deviations 
of the energy, we smoothly bring the effective time step 
to zero for the weights, which avoids the efficiency prob- 
lem without introducing a noticeable error. The bias due 
to this cutoff goes to zero as the time step goes to zero 
or as the trial function approaches the exact one. 



D. Reptation Monte Carlo 

The fluctuations in the local energy part of the Green's 
function can cause the path in RMC to get stuck, so we 
cut off the effective time step in the same way as in DMC. 
The branching part of the Green's function is otherwise 
quite smooth. We use the same dynamic Green's func- 
tion as we do in DMC (either a standard Metropolis rejec- 
tion step or the UNR [T2] algorithm), so we accept/reject 
based only on the branching part of the Green's function. 
We use the bounce algorithm [19] , which improves the ef- 
ficiency by allowing the path to explore the many-body 
phase space much more quickly. 
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E. Correlated Sampling 



B. Pseudopotentials 



Correlated sampling is a technique where one samples 
two very similar systems with the same sets of sam- 
ples. The variance in the difference will decrease as 
Var(X — Y) = Var(X) + Var(Y) - 2Cov(X, Y), so for 
perfectly correlated sampling, the variance will be zero 
for the difference. In QWalk, this is handled by per- 
forming a primary walk that samples some probability 
distribution Pi(X). Averages are obtained as usual by 
calculating the integral (Oi) = J Pi(X)0\dX . Suppose 
we wish to find (0 2 — Oi). It can be written as 



J p 2 (x)o 2 -p 1 (x)o 1 dx = I P 1 (X) 



o 2 -o 1 



dX. 
(10) 

Since we are sampling Pi(X), in the Monte Carlo averag- 
ing, this integral is evaluated by averaging the weighted 
difference over sample points: 



QWalk accepts pseudopotentials as an expansion of 
nonlocal angular momentum operators: 



V, 



ECP 



V, 



local 



:r) 



[.max 

E 



Vi(R)\l)(l\ 



(12) 



for arbitrary maximum angular moment. Vi is a basis 
function object that is typically a spline interpolation of 
a grid or a sum of Gaussian functions. While any pseu- 
dopotential of this form can be used, we use soft poten- 
tials in which the ^ divergence has been removed from 
the nuclei-electron interaction. These potentials have 
been created specifically for QMC and are available in 
the literature H7J HE] , although more traditional 

Hartree-Fock or DFT pseudopotentials in the Troullicr- 
Martins form work as well. 



N 

E 



W ,(A t )Q 2 (X t ) Oi(Xj) 
EiMXi) N 



(11) 



The difference in the methods is only in how they deter- 
mine the weights. 

VMC, DMC and RMC all support correlated sampling 
between arbitrary systems. In VMC, the weights are 



w(X) 



, which is an exact relationship. DMC and 



RMC both require some approximation to the Green's 
function to weight the secondary averages properly. In 
both, we use the approximation of Filippi and Umri- 
gar [22], who discuss the subject in a greater detail. 



V. SYSTEMS 

A. Boundary Conditions 

Most systems of interest are treatable either by open 
boundary conditions or periodic boundary conditions. 
Adding new boundary conditions is also quite simple. 
Molecules with arbitrary atoms, charge, spin state, and 
with finite electric field are supported. In 3D periodic 
systems, the calculation can be done at any real k-point, 
allowing k-point integrations. In many-body simulations, 
there is an additional finite size approximation due to 
the Coulomb interaction between image electrons. We 
correct this as 5E — — , where the r s is that of the ho- 
mogeneous electron gas and c has been empirically fitted 
to 0.36 Hartrees. We have found this correction to func- 
tion about as well as other attempts to correct the finite 
size error [23j [24] . The code has been used on systems 
with up to 135 atoms and 1080 electrons; the limiting 
factor is the amount of computer time needed to reduce 
the stochastic uncertainties. 



VI. FORMS OF THE WAVE FUNCTION 

For chemical problems, the first-order trial function is 
usually written as a single Slater determinant of Hartree- 
Fock or Density Functional Theory orbitals multiplied by 
a correlation factor (known as a Jastrow factor) which is 
optimized in Variational Monte Carlo. Between 90% and 
95% of the correlation energy is typically obtained with 
this trial wave function in Diffusion Monte Carlo. 

One of the attractions of QMC is that, since all the 
integrals are done by Monte Carlo, almost any ansatz 
can be used, as long as it is reasonably quick to evalu- 
ate. QWalk's modular structure makes adding new wave 
function forms as simple as coding one-electron updates 
of the function value and derivatives, and adding one line 
to the main code to support loading of the module. We 
have implemented several forms of wave functions, which 
the user can combine. For example, to create the Slater- 
Jastrow wave function, the user first asks for a multiply 
object, which contains two wave function objects. The 
user then fills in a Slater determinant object and a Jas- 
trow object. For a Pfaffian-Jastrow wave function, the 
user replaces the Slater determinant input with the Pfaf- 
fian input. Obviously, it is up to the user to make sure 
that the total wave function is properly antisymmetric 
and represents the problem correctly. 



A. Slater Determinant (s) 



This is the standard sum of Slater determinants, writ- 
ten as = ^Z c iD\D\, where is a determinant of 
the spin up (down) one-particle orbitals. The weights of 
the determinants {c{\ are optionally optimizable within 
VMC. 
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B. Jastrow Factor 



antisymmetric matrix 



The Jastrow factor is written as 



where 



ilk ijk 

+ c kT m [ a k(rii)ai(r jI ) + a k (rji)ai(rii)]b m (rij), 

ijlklm 

(13) 

i,j are electron indexes, and / is a nuclear index. Both 
the coefficients and parameters within the basis func- 
tions can be optimized. In addition, the {c ee } and {c eel } 
coefficients can be made spin-dependent. For the ba- 
sis functions, we satisfy the exact electron-electron cusp 
conditions with the function b(r) = cp(r / rcut) / (1 + 
7p(r /rcut)), where p{z) = z — z 2 + z 3 /3, 7 is the cur- 
vature, which is optimized, and c is the cusp (1/4 for like 
spins and 1/2 for unlike spins). Further correlation is 
added by including functions of the form b^ir) = a^(r) = 

l+pzMr/rcut) where zpp(x) = x 2 (6 - 8x + 3x 2 ) and is 
an optimized parameter. These functions have several 
favorable properties, going smoothly to zero at a finite 
cutoff radius, and covering the entire functional space 
between and rcut. This allows the Jastrow factor to be 
very compact, typically requiring optimization of around 
25 parameters while still coming close to saturating the 
functional form. While these are the standard basis func- 
tions, they can be replaced or augmented by any in the 
program by a simple change to the Jastrow input. The 
third term in Eq. (13 1, which sums over two electron in- 



dexes and ionic indexes, can be expensive to evaluate 
for large systems and is sometimes excluded. A Jastrow 
factor with only the first two terms is called a two-body 
Jastrow, and with the eei term included is called a three- 
body Jastrow. 



PF 



Pf 



<j>U 

£•1-1- 



IT 



(14) 



where ^T(ll) anc j ^TU) represent the following 

block matrices. The is the singlet pairing matrix 
from above BCS wave function, the y>^U) includes addi- 
tional unpaired one-particle orbitals for a spin-polarized 
system. Finally, the ^T(ll) are antisymmetric triplet 
pairing matrices. The operation of the Pfaffian ensures 
that the entire wave function is antisymmetric. The Pfaf- 
fian wave function contains the BCS wave function as a 
special case without triplet pairing, and therefore con- 
tains the Slater determinant wave function as well. The 
general expansion for $ is 



= ^T / c k itpi(r 1 )<pj(r 2 ) 



(15) 



kl 



under the constraint that c^i = cik- £ is written in a very 
similar way: 

£n(u) (ri)r2) = £4j(uyji) (riV Ta) (r2) (16) 



kl 



TT(U) 

kl 



= -d 



TT(U) 

Ik 



The sum 



under the constraint that d 
extends over the space of occupied and virtual orbitals. 
All pairing functions as well as unpaired orbitals are 
fully optimizable within VMC method. The extensions 
of Pfaffian pairing wave function to linear combinations 
of Pfaffians with one or many sets of different pairing 
functions are also fully implemented. For more informa- 
tion about the performance and implementation of the 
Pfaffian wave function, sec Refs. l4l 1301. 



D. Backflow Correlated Wave Function 



C. Pfaffian Pairing Wave Function 

Pairing wave functions with a Jastrow factor for 
molecules were first investigated by Casula and cowork- 
ers |29j . who studied the constant number of parti- 
cles projection of the BCS wave function. The general 
Jastrow-BCS pairing wave function can be expressed as 
= e det[«I>], where e u is the Jastrow factor of above 
and the matrix ffr^ = (f>(r.i,rj) is the pairing function 
between opposite-spin electrons (the function is easily 
extended for N up ^ Ndown)- This function contains 
the Slater determinant as a special case (for singlet spin 
state) when <f> is written as the sum over the occupied 
single-particle orbitals: ^(r^r,) = Y,k" <Pk(n)tPk{rj)- 
We have implemented the Pfaffian jj] pairing wave func- 
tion, which allows not only unlike-spin pairing, as the 
canonical projection of the BCS wave function does, but 
also allows like-spin pairing. The general Pfaffian pair- 
ing wave function ^ pf is written as the Pfaffian of the 



Another way to systematically improve the nodal 
structure of trial wave function is through the introduc- 
tion of backflow transformation |3"T1 13"2"l 1531 15411331 13"51 
|3T1 13HJ • Given a trial wave function of form 'I't(R) = 
^^(R) x exp[C/(R)], the nodal structure is completely 
defined by the nodes of its antisymmetric part ^^(R). 
The backflow transformation replaces 'I'a(R) by ^^(X), 
where X = (xi,X2,...) are some quasi-coordinates de- 
pendent on all electron positions R, such the overall an- 
tisymmetry is preserved. The nodes of ^^(X) can then 
differ from nodes of '^(R) and improve the fixed- node 
approximation. 

The QWalk implementation of the backflow transfor- 
mation into Slater and Pfaffian wave functions closely fol- 
lows the approach in Refs. [36, 39 . The quasi-coordinate 
of ith electron at position is given as 

Xi =r i + € < (R) 

= r. 1 + ^r(R) + C(R) + C"(R), (17) 
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where £j is the ith electron's backflow displacement 
divided to the contributions from one-body (electron- 
nucleus), two-body (electron-electron) and three-body 
(electron-electron- nucleus) terms. They can be further 
expressed as 



cr(R)=£ 



Ilk 



^r n (R) = 



J2 c k a k(ni) 

k 



(18) 
(19) 



E 



c fcfm [ a fc,i-*i a /,i-ji + Ofc,r 3 -i Oi,rji] &m,7-ij 

/dm 



klra 



ru, 
(20) 



where = r j — r j and r jj = r ; — rj . The terms in the 
large square brackets are identical to our familiar one, 
two and two three-body Jastrow terms from Eq. (13 1. 



The implementation of backflow transformation there- 
fore takes great advantage of already existent Jastrow. 
The improvement in nodal structure and gains in cor- 
relation energies can be achieved by optimizing all the 
Jastrow parameters within backflow transformation. For 
more details about implementation and performance of 
backflow transformation in QWalk see Ref. [50] , 



Choose system, pseudopotential, 



T 



Generate one-particle orbitals 
(GAMES S, CRYSTAL, etc) 



J 



Convert orbitals and system to 
QWalk format 



T 



Iterate until energy is 
converged (2-3 iterations) 



Evaluate energy and generate 
configurations using VMC 



T 



Optimize Jastrow using VMC 
configurations 



T 



VMC to generate initial configurations 
for DMC or RMC 



J 



DMC or RMC calculation 



FIG. 3: Flow of a QMC calculation 



and adds to each molecular orbital the coefficient times 
the value of that basis function using fast BLAS routines. 



VII. ONE-PARTICLE ORBITAL EVALUATION 

We provide two major ways of evaluating the one- 
particle orbitals, the most expensive part of the QMC 
calculation. For a single electron, this is the problem of 
finding fa = M or i,b, where m is a vector of the values of 
each orbital, M or b is the orbital coefficient matrix, and b 
is the vector of basis functions. The first (CUTOFF MO) 
is a linear scaling technique, which, for localized orbitals 
and large enough systems, will take O(N) time to evalu- 
ate all orbitals for all electrons. For each basis function, 
it creates a list of orbitals for which the coefficient is 
above a cutoff. This is done at the beginning of the cal- 
culation. Then, given an electron position, it loops over 
only the basis functions within range of the electron, and 
then only the orbitals contributed to by the basis func- 
tion. These are both 0(1) cost for large enough systems, 
so all the orbitals for each electron is evaluated in O(l) 
time, giving O(N) scaling. 

The second method (BLAS_MO) is slightly simpler. 
While it scales in principle as 0(N 2 ), it can be faster 
than CUTOFF JVIO in medium-sized systems and certain 
types of computers that have very fast BLAS routines, 
such as Itaniums. Given an electron position, it loops 
through the basis functions within range of the electron, 



VIII. EXAMPLE CALCULATION 

To give a feeling for the flow of the program, we will go 
through a simple calculation. A schematic of the proce- 
dure is given in Fig. [3] The first two steps are to choose 
the system and use a one-particle code such as GAMESS 
or CRYSTAL to prepare the one-particle orbitals, which 
is done as usual for the code. The converter program in- 
cluded with QWalk then creates the system, slater, and 
jastrow files automatically, so all the user must do is use 
the include directive to use them. In Fig. |4] we evaluate 
the properties of the starting Slater wave function by cre- 
ating 500 electron configurations, and then propagating 
them for 16 blocks, each of which consists of 10 Monte- 
Carlo steps with a time step of 1.0 a.u. The final set of 
configurations is then stored in 'configfile', and QWalk 
outputs the total energy and other properties that have 
been accumulated along the way. 

We then wish to obtain some correlation energy by 
adding the Jastrow factor (Fig. [5]). The converter has al- 
ready created a null Jastrow wave function, so we request 
a Slater-Jastrow wave function. The first wave function 
is the Slater determinant that we used before, and the 
second is the Jastrow created by the converter. We re- 
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#load the converted pseudo-nuclei, number of electrons 
include sysfile 

#load the Slater determinant of one-particle orbitals 
trialfunc { include slaterfile } 



method { VMC 
nconfig 500 
nblock 16 
nstep 10 
timestep 1.0 



#number of configurations 
#averaging blocks 
#steps to take per block 
#timestep to use 



storeconfig configfile #save configurations to 
#a file 



FIG. 4: Example input file for VMC evaluation of properties. 
This corresponds to the fourth box in Fig. [3] 



include sysfile 

#load the wavefunction file generated by OPTIMIZE 
trialfunc { include optf ile . wf out } 

#Same as above 
method { VMC 

nconfig 500 

timestep 1.0 

nstep 10 

nblock 16 

readconfig configfile 
storeconfig configfile 



#perform DMC 



include sysfile 



method { DMC 
nconfig 500 
timestep 0.02 
nstep 50 
nblock 16 
readconfig configfile 
storeconfig configfile 

} 



#smaller timestep 
#more steps per block 



trialfunc { 
#a meta wave function that 
#multiplies two wave functions 
multiply 

wfl { include slaterfile } 



FIG. 6: Example input file for evaluation of properties of 
the correlated wave function, plus a DMC calculation. This 
corresponds to the sixth and seventh block in Fig. [3] 



#Jastrow correlation factor (created by converter) 
wf2 { include jastrowfile } 



IX. OTHER UTILITIES 



method { OPTIMIZE 
nconfig 500 
iterations 30 

#read in the configurations from 
#the previous VMC run 
readconfig configfile 



FIG. 5: Example input file for optimization of variational 
parameters. This is the fifth box in Fig. [3] 



A. Conversion of One-particle Orbitals 

Currently, QWalk can import and use the orbitals 
from GAMESS [ID] (gaussian basis on molecules), 
CRYSTAL [41] (gaussian basis for extended systems), 
SIESTA gg, and GP 03] (plane waves for extended sys- 
tems). The GP interface is not currently available for 
distribution due to licensing issues. More interfaces are 
planned, and are quite easy to add. 



Plane Wave to LCAO converter 



quest optimization using a fixed set of walkers that we 
generated in the previous VMC run. 

Finally, we wish to evaluate properties of the new cor- 
related wave function using the VMC routine (Fig. [6|. 
This is the same as the last, except that we include the 
output wave function from the optimization in the trial- 
func section. Also in the example, we perform a DMC 
calculation immediately after the VMC calculation. Its 
input is nearly identical to that already discussed. 



Gaussian basis sets have been used in quantum chem- 
istry for years and have been developed to the point that 
there are well-defined sets which saturate the one-body 
Hilbert space surprisingly quickly. They are localized, 
which improves the scaling of QMC, and allow a very 
compact expression of the one-particle orbitals, so less 
basis functions need to be calculated. Overall, a gaus- 
sian representation can improve the performance of the 
QMC code by orders of magnitude over the plane-wave 
representation. We have developed a simple method to 
do this conversion that is fast and accurate. We start 
with the plane-wave representation of the A:-th orbital 
&k(f) = X^g c fcG e G^' anc ^ w i sn to find the LCAO equiv- 
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Number of processors 



FIG. 7: Scaling of QWalk code over processors in Monte Carlo 
steps per second. The system is a 2x2x2 cell of BaTi03 with 
320 electrons and one walker per node, on the San Diego 
Supercomputing Center DataStar machine. This is VMC; 
DMC is very much the same, because of the constant walker 
algorithm. This is close to a worst-case scenario for QMC, 
since the run was only approximately 40 seconds long. 



alent <fr^ CAO (f) — 52 • akj<j>j(r), where eg is a plane- wave 
function and <pj is a Gaussian function. Maximizing the 
overlap between and ^u CAO , we obtain Sa^ — Pc^, 



where Si 



and P iS 



>i|eg). Then the Gaus- 
sian coefficients are given as = S~ x Pck- All the over- 
lap integrals are easily written in terms of two-center in- 
tegrals for S, and P is easily evaluated in terms of a 
shifted Gaussian integral. The limiting part of the con- 
version is the calculation of the inverse of S, which can 
be done with fast LAPACK routines. 



X. CONCLUSION 

QWalk is a step forward in creating a state of the art, 
usable, and extensible program for performing Quantum 
Monte Carlo calculations on electronic systems. It is able 
to handle medium to large systems of electrons; the max- 
imum size is mostly limited by the available computer 
time. It works in parallel very efficiently (Fig. [7|, so it 
can take advantage of large clusters, multi-core comput- 
ers, etc. Since QWalk is available without charge and 
under the GNU Public license, it is hoped that it will 
help bring both development and use of Quantum Monte 
Carlo methods to a wide audience. Due to its mod- 
ular form it is straightforward to expand the QWalk 's 
applicability to quantum systems beyond the electron- 
ion Hamiltonians in continuous space such as models of 
BEC/BCS condensates and other quantum models. It is 
easy to modify the system module to incorporate other 
types of interactions and to expand the one-particle and 
pair orbitals using the coded basis functions. 
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APPENDIX A: ADDING A MODULE 

We provide an example of how to add a new module. 
In this case, we look at a Basis_function object, which has 
the fewest functions to fill in. All modules can be added 
in exactly the same way, differing only in what functions 
need to be defined. Suppose we wish to add a Gaussian 
function exp(— ax 2 ). First we declare the new module in 
a header file: 

class Gaussian_basis : public Basis_f unction { 
public : 

//read the input 

void read(vector <string> & words) ; 

//the distance at which the function is zero 
double cutoff (); 

//The work functions. Given a distance, 

//getVal returns the values 

//and getLap returns the values, first 

//derivatives with respect to x,y, and z, 

//and the Laplacian 

void getVal(Arrayl <doublevar> & r, 

Arrayl <doublevar> & vals) ; 
void getLap (Arrayl <doublevar> & r, 

Array2 <doublevar> & vals) ; 

private : 

//put local variables here 
double alpha; 
double cut ; 

>; 

Then we define the new functions: 

Gaussian_basis : : read(vector <string> & words) { 
unsigned int pos=0; 

if (! readvalue (words , pos, alpha, "ALPHA")) 
errorC'Need ALPHA in gaussian basis"); 
const double m=le-18; 
cut=sqrt (-log(m/alpha) ) ; 

} 

Gaussian_basis :: cutoff () { 
return cut ; 

} 
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Gaussian_basis : :getVal(Arrayl <doublevar> & r, 

Arrayl <doublevar> & vals) -f/getLap is omitted for space reasons 

//The basis function module can represent several 

//functions, which are put into the vals array. The programmer then adds the source file to the Makefile 

/ /Here we only have one . and into a single if statement in the Basis junction, cpp 

I It is an array of form r,r~2,x,y,z file. The module can now be used anywhere another 

vals(0)=exp(-alpha*r(D) ; Basis_funtion can be used. All the modules follow this 

} basic procedure, just with different functions. 
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